Choiseul
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval
Abdallah, Abdelrahman, Mozafari, Jamshid, Piryani, Bhawna, Jatowt, Adam
Retrieval-Augmented Generation (RAG) models have drawn considerable attention in modern open-domain question answering. The effectiveness of RAG depends on the quality of the top retrieved documents. However, conventional retrieval methods sometimes fail to rank the most relevant documents at the top. In this paper, we introduce ASRank, a new re-ranking method that scores retrieved documents using a zero-shot answer scent: a pre-trained large language model computes the likelihood that the document-derived answers align with the answer scent. Our approach demonstrates marked improvements across several datasets, including NQ, TriviaQA, WebQA, ArchivalQA, HotpotQA, and Entity Questions. Notably, ASRank increases Top-1 retrieval accuracy on NQ from $19.2\%$ to $46.5\%$ for MSS and from $22.1\%$ to $47.3\%$ for BM25. It also shows strong retrieval performance on several datasets compared to state-of-the-art methods (47.3 Top-1 for ASRank vs. 35.4 for UPR, both with BM25).
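The re-ranking step and the Top-1 metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the ASRank scoring itself: the abstract describes an LLM-based likelihood, whereas `overlap_score` below is a hypothetical stand-in that counts shared words with an answer scent string. The function names, the example documents, and the answer-matching rule (case-insensitive substring match) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def rerank(docs: Sequence[str], score: Callable[[str], float]) -> List[str]:
    """Return documents sorted by descending score (ties keep input order)."""
    return sorted(docs, key=score, reverse=True)


def top_k_accuracy(ranked_lists: Sequence[Sequence[str]],
                   answers: Sequence[Sequence[str]],
                   k: int = 1) -> float:
    """Fraction of questions whose top-k documents contain a gold answer."""
    hits = 0
    for ranked, gold in zip(ranked_lists, answers):
        if any(g.lower() in d.lower() for d in ranked[:k] for g in gold):
            hits += 1
    return hits / len(answers)


def overlap_score(scent: str) -> Callable[[str], float]:
    """Hypothetical stand-in for an LLM likelihood: word overlap with the scent."""
    scent_tokens = set(scent.lower().split())

    def score(doc: str) -> float:
        return float(len(scent_tokens & set(doc.lower().split())))

    return score


# Toy retrieval output for the question "What is the capital of France?"
docs = [
    "Berlin is the capital of Germany.",
    "Paris is known for its museums.",
    "The capital of France is Paris.",
]
gold = [["Paris"]]

# Assumed answer scent; in the paper this would come from a language model.
ranked = rerank(docs, overlap_score("capital of France Paris"))

before = top_k_accuracy([docs], gold, k=1)
after = top_k_accuracy([ranked], gold, k=1)
print(before, after)
```

In this toy case the retriever's first document does not contain the answer, while the re-ranked list puts an answer-bearing document first, which is the effect the Top-1 numbers in the abstract measure at dataset scale.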
Surrealistic-like Image Generation with Vision-Language Models
Ayten, Elif, Wang, Shuai, Snoep, Hjalmar
Recent advances in generative AI make it convenient to create different types of content, including text, images, and code. In this paper, we explore the generation of images in the style of paintings from the surrealism movement using vision-language generative models, including DALL-E, Deep Dream Generator, and DreamStudio. Our investigation starts with the generation of images under various settings and with different models. The primary objective is to identify the most suitable model and settings for producing such images. Additionally, we aim to understand the impact of using edited base images on the generated images. Through these experiments, we evaluate the performance of the selected models and gain insights into their capabilities in generating such images. Our analysis shows that DALL-E 2 performs best when using the prompt generated by ChatGPT.
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models
Truong, Sang T., Nguyen, Duc Q., Nguyen, Toan, Le, Dong D., Truong, Nhi N., Quan, Tho, Koyejo, Sanmi
We employ fine-tuning on the LLaMa-2, Mixtral 8×7B, Gemma, and conduct a comprehensive evaluation of Vietnamese LLMs across various scenarios and settings. Throughout the thorough evaluation process, we observe the following: (i) larger language models exhibit unseen capabilities compared to smaller counterparts; (ii) larger language models tend to manifest more biases, produce uncalibrated results, and are more susceptible to the influence of input prompts; (iii) the quality of training or fine-tuning datasets is the key for unlocking LLM performance. Our key contributions include:
Large language models (LLMs) such as GPT-4 (OpenAI, 2023), BLOOM (Le Scao et al., 2023), LLaMa-2 (Touvron et al., 2023), Mistral (Jiang et al., 2023), Mixtral (Jiang et al., 2024), Gemma (Team et al., 2024) have made significant contributions to the field of natural language processing (NLP). Despite their advancements, a gap remains in their specialization for many languages, including Vietnamese. This paper addresses the development and evaluation of Vietnamese-centric LLMs. Vietnam, with a population surpassing 100 million, ranks as the 16th most populous country globally.